IPython Interpreter And Jupyter Notebooks
Python is an interpreted language, so a program is run by an interpreter which executes a single statement at a time. IPython is an interpreter designed for both interactive computing and software development, and it encourages an execute-explore workflow instead of the edit-compile-run workflow typical of many other programming languages. In addition, it provides directly integrated access to the shell and filesystem of the operating system (removing the need to switch between the current session and a terminal). Jupyter is an initiative to design language-agnostic interactive computing tools, and IPython serves as the kernel for using Python with Jupyter. With regard to data analysis, IPython and Jupyter allow for efficient exploration, interaction, testing, debugging, and iteration. These notes rely on the ideas and learnings from the respective package documentation, "Python For Data Analysis: Data Wrangling With Pandas, NumPy, And Jupyter", 3rd Edition, by Wes McKinney (creator and developer of pandas) in 2022, and "Python Data Science Handbook: Essential Tools For Working With Data", 2nd Edition, by Jake VanderPlas in 2022.
IPython Shell System
IPython can be seen as an enhanced Python interpreter which offers additional features relative to the standard Python interpreter and IDLE (Integrated Development And Learning Environment). In more detail, the IPython shell and kernel provide comprehensive object introspection; input history which is persistent across sessions; caching of output results during a session with automatically generated references; extensible tab completion with support for completion of variables, functions, arguments, keywords, and file names; an extensible system of commands for controlling the environment and performing many tasks related to IPython or the operating system; a rich configuration system with the ability to switch between different setups; session logging and reloading; extensible syntax processing for special-purpose situations; access to the system shell with a user-extensible alias system; integrated access to debuggers and profilers; and rich display of HTML, images, sounds, videos, and LaTeX. It should also be noted that the IPython shell will usually render text with syntax highlighting for improved readability.
...
~ $ ipython
~ $ ipython
In [1]: %run file.py
In [1]: variable?
In [1]: function?
In [1]: numpy.*load*?
In [1]: pandas.*read*?
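The `?` lookups above are IPython syntax and will not run in a plain Python script. As a rough analogue only, the sketch below uses the standard library to approximate a wildcard search and an object lookup. The helper `search_names` is a hypothetical name introduced for illustration, and the `json` module stands in for NumPy or pandas so that the example has no third-party dependencies.

```python
import fnmatch
import inspect
import json


def search_names(module, pattern):
    """Return the names in `module` that match a shell-style wildcard."""
    return sorted(fnmatch.filter(dir(module), pattern))


# Rough analogue of `json.*load*?` in IPython.
matches = search_names(json, "*load*")
print(matches)

# Rough analogue of `json.loads?`: the call signature and the
# first line of the docstring.
print(inspect.signature(json.loads))
print(inspect.getdoc(json.loads).splitlines()[0])
```

IPython presents this information in a richer form; the sketch only shows that the underlying information is ordinary Python object metadata.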
Configuration...
...
%timeit
Jupyter Notebook System
A primary component of the Jupyter project is the notebook system. The notebook system provides a means for creating rich and interactive documents by allowing for the authoring of content in HTML or Markdown alongside source code, data visualizations, and other outputs. A notebook interacts with kernels, which are implementations of the Jupyter computing protocol specific to different programming languages (kernels currently exist for over 40 programming languages). The Python Jupyter kernel uses the IPython system for its underlying behaviour through ipykernel. Although usually used as local computing environments, notebooks can also be deployed on servers and accessed remotely.
When the notebook application is started, a local server is launched to host notebooks. A new notebook can then be created by visiting the URL of the server in a browser. When a notebook is saved, all of its content, including any evaluated code output, is stored in a self-contained file with the .ipynb extension. The notebook can be edited in the native web-based interface from a browser, or in various integrated development environments which offer additional features, such as Spyder, Visual Studio Code (with extensions for Python and Jupyter), and JupyterLab. It should be noted that many integrated development environments, such as Spyder and Visual Studio Code, can directly open a notebook from an .ipynb file without manually starting a server.
~ $ pip install ipykernel
~ $ pip install notebook
~ $ pip install jupyter
~ $ pip install jupyterlab
~ $ jupyter notebook
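Because a saved notebook is a self-contained JSON document, its cells and stored outputs can be inspected with the standard library alone. The sketch below builds a hypothetical, minimal notebook in the version 4 layout (it is illustrative and not validated against the official schema) and summarizes its cells. The helper `summarize_cells` is a name introduced here for illustration.

```python
import json

# Hypothetical minimal notebook, written in the version 4 JSON layout.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
})


def summarize_cells(text):
    """Return (cell_type, source, number_of_outputs) for each cell."""
    notebook = json.loads(text)
    summary = []
    for cell in notebook["cells"]:
        source = "".join(cell["source"])          # source is stored as lines
        outputs = len(cell.get("outputs", []))    # only code cells have outputs
        summary.append((cell["cell_type"], source, outputs))
    return summary


cell_summary = summarize_cells(notebook_text)
for row in cell_summary:
    print(row)
```

For real notebooks, the nbformat package is the more robust choice, since it handles schema versions and validation.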
Configuration...
Screenshots of native, Spyder, VS Code, and JupyterLab
Tuples, Lists, Dictionaries, Sets
Of the built-in data structures, the most frequently used collection types are tuples, lists, dictionaries, and sets (only tuples and lists are sequences in the strict sense). A tuple is a fixed-length and immutable sequence of objects which cannot be modified once assigned - in other words, it is not possible to change which objects are stored in each slot of a tuple (although the objects within the slots may be modified if they are mutable). A list is a variable-length and mutable sequence of objects which can be modified once assigned. A dictionary (possibly referred to as a hash map or associative array in other programming languages) stores a collection of key-value pairs, where the keys and their associated values are objects; the keys must be hashable, which generally means immutable objects like scalar types (strings, integers, or floats) or tuples (only containing immutable objects). A set is an unordered collection of unique objects, and its elements must be hashable in the same way as dictionary keys. Each type provides additional methods and operations, such as indexing, concatenating, sorting, finding sizes, counting occurrences, appending objects, inserting objects, removing objects, or set operations.
variable_tuple = (0, 1, 2, 3.1415, "Example", True, False, ("X", 3, None), ["Y", range(10)])
variable_list = [0, 1, 2, 3.1415, "Example", True, False, ("X", 3, None), ["Y", range(10)]]
variable_dictionary = {"A": 0, "B": [True, False], 1: (None, 1), ("Key", True): ["Example", 3.1415]}
variable_set = {0, 1, 2, 3.1415, "Example", True, False, ("X", 3, None), ("Y", range(10))}
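The differences in mutability, hashability, and uniqueness described above can be checked directly. The following is a minimal sketch with illustrative variable names (none of them come from the original notes).

```python
# A tuple's slots cannot be reassigned...
point = (1, 2, ["a"])
try:
    point[0] = 99
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# ...but a mutable object stored in a slot can still change.
point[2].append("b")

# A list can grow, shrink, and be reordered in place.
values = [3, 1, 2]
values.append(4)
values.insert(0, 0)
values.remove(3)
values.sort()

# Dictionary keys must be hashable: a tuple works, a list does not.
lookup = {("x", 1): "tuple key"}
try:
    lookup[["x", 1]] = "list key"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

# A set keeps unique values only; True equals 1, so it is not added again.
unique = {1, 2, 2, 3, True}
common = unique & {2, 3, 4}

print(point, values, lookup, unique, common)
```

Note that the `variable_set` example above behaves the same way: since `True == 1` and `False == 0`, the set keeps only one object from each equal pair.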
There are also several useful built-in sequence functions, which include enumeration, sorting, zipping, and reversing. Enumeration is often used to track the index of each value while iterating over an object in a for-loop. Sorting creates a new sorted list from the values in an object, where the values are arranged in ascending order by default (numerically for numbers and lexicographically for strings). Zipping pairs up the values of a number of objects to create an iterator of tuples, which can be converted to a list (the number of tuples is determined by the shortest object). Reversing creates an iterator over the values of a sequence in reverse order. Comprehensions can also be helpful for concisely forming new objects (lists, dictionaries, or sets) by filtering the values of an iterable and performing an operation on them. Alternatively, the map function can be used in a similar manner to comprehensions (without capabilities for filtering).
variable_list = [value.upper() for value in collection if len(value) > 8]
variable_dictionary = {value: index for index, value in enumerate(collection) if value < 7}
# str.count() requires the substring to count; "A" is used here for illustration
variable_set = {value.count("A") for value in collection if value[0] == "A"}
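The comprehensions above rely on an undefined `collection`. The sketch below uses small, made-up lists (`names` and `scores` are illustrative, not from the original notes) to show each of the functions described above in runnable form.

```python
names = ["delta", "alpha", "charlie", "bravo"]
scores = [4, 1, 3]

# enumerate: track the index alongside each value.
indexed = list(enumerate(names))

# sorted: a new list in ascending (here lexicographic) order.
ordered = sorted(names)

# zip: pairs stop at the shortest input, so "bravo" is dropped.
paired = list(zip(names, scores))

# reversed: an iterator over the values in reverse order.
backwards = list(reversed(scores))

# Comprehensions: filter and transform in a single expression.
long_upper = [name.upper() for name in names if len(name) > 5]
positions = {name: index for index, name in enumerate(names)}
initials = {name[0] for name in names}

# map: transform every value (no filtering).
lengths = list(map(len, names))

print(indexed, ordered, paired, backwards, sep="\n")
print(long_upper, positions, initials, lengths, sep="\n")
```

Wrapping `enumerate`, `zip`, `reversed`, and `map` in `list` is only needed here to materialize the results; in a for-loop the iterators can be consumed directly.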
Virtual Environments
Since it is often necessary to use packages and modules which do not come with the standard library, it can be helpful to isolate the environment for a specific project, such that packages and modules can be managed at specific versions. This can be done through a virtual environment, which is a self-contained directory tree containing an installation for a particular version of Python and additional packages and modules. In other words, a virtual environment provides a cooperatively isolated runtime environment which allows users and applications to install and upgrade packages and modules without interfering with the behaviour of other users and applications running on the same system. Activating a virtual environment prepends the directory containing its executables to PATH, so that running a script will invoke the interpreter of the environment and installed scripts can be run without having to use their full paths.
A common directory location for a virtual environment is .venv:
~ $ python3 -m venv path/to/Environment
~ $ source path/to/Environment/bin/activate
(Environment) ~ $ deactivate
~ $ python3 -m venv --upgrade path/to/Environment
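To confirm from within Python whether the running interpreter belongs to a virtual environment, one common check is to compare `sys.prefix` with `sys.base_prefix`. The sketch below assumes an environment created with the standard `venv` module; the function names are introduced here for illustration.

```python
import os
import sys


def in_virtual_environment():
    """Return True when the interpreter runs from a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def first_path_entry():
    """Return the first directory listed on PATH (empty string if unset)."""
    return os.environ.get("PATH", "").split(os.pathsep)[0]


print("Interpreter:", sys.executable)
print("In virtual environment:", in_virtual_environment())
print("First PATH entry:", first_path_entry())
```

After activation, the first PATH entry is expected to be the environment's executable directory. The prefix check can also report `True` without activation if the environment's interpreter is invoked by its full path, since only the PATH change depends on activation.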
Once the environment is activated, packages and modules are installed into the virtual environment only. For convenience, a requirements list can be included at the root of the project as ./requirements.txt. This file should contain the relevant packages and modules with their associated versions for the project. To upgrade a package or module, the version can be updated in the requirements list and the list re-installed to apply the change. If the packages and modules have already been installed for a project, a requirements list can simply be generated from the existing installation.
# Web Framework
Flask==2.0.2
# Database ORM
SQLAlchemy<=1.4.25
# Data Analysis
pandas==1.3.3
numpy==1.21.2
# Data Visualization
matplotlib==3.4.3
seaborn==0.11.2
# Machine Learning
scikit-learn==0.24.2
tensorflow==2.7.0
# PyTorch is published on PyPI under the name "torch"
torch==1.9.1
# Authentication
bcrypt>=3.2.0
pyjwt>=2.3.0
# Testing
pytest==6.2.4
coverage==6.2.2
requirements.txt:
(Environment) ~ $ python -m pip install --requirement requirements.txt
(Environment) ~ $ python -m pip install --requirement requirements.txt --force-reinstall
(Environment) ~ $ python -m pip freeze > requirements.txt
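The pinned `name==version` lines which `pip freeze` writes can be approximated from within Python using `importlib.metadata` (Python 3.8 or later). This is a simplified sketch rather than a replacement for `pip freeze`, which also handles cases such as editable and URL-based installs; `pinned_requirements` is a hypothetical helper name.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()
    for distribution in metadata.distributions():
        name = distribution.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            lines.add(f"{name}=={distribution.version}")
    return sorted(lines, key=str.lower)


requirement_lines = pinned_requirements()
for line in requirement_lines:
    print(line)
```

Run inside an activated virtual environment, the output lists only the distributions visible to that environment's interpreter.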
If Spyder is used for development, one option is to activate the virtual environment and then install Spyder through pip. Spyder can then be launched from within the virtual environment, after which the interpreter for IPython needs to be edited to point to the particular version of Python for the virtual environment. Alternatively, to avoid installing a copy of Spyder in each virtual environment and to allow for flexibility and configurability, a modular approach can be followed: Spyder is installed in the base environment, the necessary kernels (the spyder-kernels package) are installed in the virtual environment, and the path to the interpreter for IPython is edited in preferences to point to the particular version of Python for the virtual environment (although this will need to be edited each time the virtual environment is changed).
If Visual Studio Code is used for development, the Python interpreter can be changed to point to the particular version of Python for the virtual environment (although this will need to be changed each time the virtual environment is changed). Conveniently, the selected interpreter can be associated with the current workspace folder.
~ $ source path/to/Environment/bin/activate
(Environment) ~ $ pip install spyder-kernels
(Environment) ~ $ python3 -c "import sys; print(sys.executable)"
(Environment) ~ $ deactivate
Preferences > Python Interpreter > Use The Following Interpreter
~ $ source path/to/Environment/bin/activate
(Environment) ~ $ python3 -c "import sys; print(sys.executable)"
(Environment) ~ $ deactivate
Command Palette > Python: Select Interpreter